From Statistical Likelihood to Convex Programs
MATH008 Lesson 7
00:00

Statistical inference asks: "Given this data, what are the most likely underlying parameters?" This slide bridges that question with convex optimization. We transform the probabilistic notion of likelihood into a structured program, showing that under log-concavity, finding the best estimate is equivalent to solving a convex optimization problem.

The Likelihood Framework

The likelihood function is the probability distribution $p_x(y)$ considered as a function of the parameter $x$ for a fixed observed sample $y$. To estimate $x$, we employ Maximum Likelihood (ML) estimation: choosing the value that makes the observed data most probable.

$$\hat{x}_{ml} = \text{argmax}_x p_x(y) = \text{argmax}_x l(x)$$

For computational convenience, we work with the log-likelihood function, $l(x) = \log p_x(y)$. Because the logarithm is monotonically increasing, it preserves the location of the maximum while turning the products that arise from independent observations into easy-to-manage sums.
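As a minimal sketch of this idea (assuming i.i.d. Gaussian samples with known unit variance, names chosen for illustration), the log-likelihood is a sum of log-densities, and its maximizer over a grid of candidates recovers the sample mean:

```python
import math

def gaussian_log_likelihood(x, data, sigma=1.0):
    """Log-likelihood l(x) = sum of log densities: the product of
    independent densities becomes a sum of their logarithms."""
    return sum(
        -0.5 * math.log(2 * math.pi * sigma**2)
        - (y - x) ** 2 / (2 * sigma**2)
        for y in data
    )

data = [1.9, 2.1, 2.3, 1.7, 2.0]

# For a Gaussian with known sigma, the maximizer of l(x) is the sample mean.
candidates = [i / 100 for i in range(100, 301)]
x_ml = max(candidates, key=lambda x: gaussian_log_likelihood(x, data))
# x_ml is the sample mean, 2.0
```

A grid search is used here only so the example stays self-contained; in practice one would maximize $l$ analytically or with a numerical solver.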

The MLE Optimization Program (7.1)

We formalize the estimation as a mathematical program:

$$\begin{array}{ll} \text{maximize} & l(x) = \log p_x(y) \\ \text{subject to} & x \in C \end{array}$$ (7.1)

This program is a convex optimization problem if:

  • The log-likelihood function $l$ is concave for each value of $y$.
  • The feasible set $C$ (prior information) is described by linear equality and convex inequality constraints.
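A standard instance where both conditions hold (an illustrative setup, not taken from the slide): linear measurements $y_i = a_i x + v_i$ with Gaussian noise $v_i \sim \mathcal{N}(0, \sigma^2)$ and $C = \mathbb{R}$. The log-likelihood is a concave quadratic in $x$, so program (7.1) is convex and its maximizer has the familiar least-squares closed form:

```python
# Assumed measurement model: y_i = a_i * x + v_i, v_i ~ N(0, sigma^2).
# l(x) = const - sum_i (y_i - a_i * x)^2 / (2 sigma^2), a concave quadratic,
# so maximizing it is the least-squares problem with solution below.
a = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]

# Closed-form maximizer: x_ml = (sum a_i y_i) / (sum a_i^2)
x_ml = sum(ai * yi for ai, yi in zip(a, y)) / sum(ai * ai for ai in a)
```

This is the simplest illustration of the slide's claim: a log-concave noise model turns ML estimation into a globally solvable convex program.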

Integrating Constraints and Priors

To impose physical or prior constraints explicitly, we redefine $p_x(y)$ to be zero for $x \notin C$. In the optimization, this means the log-likelihood function takes the value $-\infty$ for parameters $x$ that violate these constraints, effectively creating an impassable barrier for the optimizer.
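The barrier idea can be sketched directly (an illustrative example, not the slide's: Gaussian data with a hypothetical feasible interval $C = [2.5, 5]$ that excludes the unconstrained maximizer, so the constrained estimate lands on the boundary of $C$):

```python
import math

def constrained_log_likelihood(x, data, lo, hi):
    """Assign -inf outside the feasible set C = [lo, hi], so the
    optimizer can never select an infeasible parameter."""
    if not (lo <= x <= hi):
        return -math.inf
    # Gaussian log-likelihood with sigma = 1, additive constants dropped.
    return sum(-(y - x) ** 2 / 2 for y in data)

data = [1.8, 2.0, 2.2]  # unconstrained maximizer: sample mean 2.0
candidates = [i / 100 for i in range(0, 501)]
x_hat = max(candidates, key=lambda x: constrained_log_likelihood(x, data, 2.5, 5.0))
```

Since $l$ is concave and decreasing past the sample mean, the constrained maximizer sits at the lower boundary of $C$, exactly as the barrier picture predicts.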

🎯 Core Principle
The transition from "Maximum Likelihood" to "Convex Program" relies on the concavity of the log-density. If the noise or distribution is log-concave, statistical estimation becomes a globally solvable optimization task.